Modelling potential environmental impacts of science activity in Antarctica
Abstract
We use GPS data collected on a science expedition in Antarctica to estimate hiking functions for the speed at which humans traverse terrain, differentiated by slope and by ground cover (moraines and rock). We use the estimated hiking functions to build weighted directed graphs as representations of specific environments in Antarctica. From these we estimate, using a variety of graph metrics (particularly betweenness centrality), the relative potential for human environmental impacts arising from scientific activities in those environments. We also suggest a simple approach to planning science expeditions that might allow for reduced impacts in these environments.
1 Introduction
Overview of Antarctic science: when and where, its intensity etc. Background on international treaties, etc.
Relevant findings as to human impacts in Antarctica. Note that in this environment even ‘leave only footprints’ is likely impacting the environment in significant ways.
Overview of sections ahead.
3 Data sources
3.1 Antarctic geospatial data
Geospatial data for Antarctica were obtained from sources referenced in Cox et al. (2023b), Cox et al. (2023a), and Felden et al. (2023). The study area was defined to be the Skelton and Dry Valleys basins, as delineated by the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) project (Mouginot and University Of California Irvine 2017) and shown in Figure 3 (a). The Skelton basin was included because the expedition GPS data, while ostensibly collected in the McMurdo Dry Valleys, actually extend into that basin as shown in Figure 3 (b). Elevation data from the Reference Elevation Model of Antarctica (REMA) project (Howat et al. 2022) and geology from GeoMAP (Cox et al. 2023b) are shown in Figure 3 (c). The five largest areas of contiguous non-ice surface geology across the study area, shown in Figure 3 (d), were chosen as the specific sites for more detailed exploration using the methods set out in this paper. These range in size from around 320 to 2600 square kilometres.
3.2 GPS data from an expedition
FRASER: Timeline, devices used, and associated protocols for scientists while on site.
GPS data were processed to make them better suited for use in the estimation of hiking functions.
The first processing step was to confirm the plausibility of the data, particularly the device-generated speeds, distances between fixes, and elevations associated with fixes. The challenges of post-processing GPS data are well documented, and relate to issues with GPS drift, which can lead to estimated non-zero movement speeds as a result of noise in the signal. The raw GPS data included distance since last fix, speed, and elevation estimates, and in all cases the device-generated results for these measurements were judged likely to be more reliable than values calculated by post-processing the raw latitude-longitude fixes.
The second processing step was to remove fixes associated with faster movement on modes of transport other than walking. Wood et al. (2023) cite a number of previous works that detect trip segments based on recorded speeds. This method was readily applicable to our data, as scientists arrive at the expedition base camp and occasionally travel to more remote experimental sites by helicopter.
The third, more challenging processing step was to deal with sequences of fixes associated with non-purposeful movement when scientists were in or around base camp, at rest stops, or at experimental sites. Crude filters removed fixes with low recorded distances between fixes (less than 2.5 metres), high turn angles at the fix location (greater than 150°), and fixes recorded on ice-covered terrain, but this did not clean the data sufficiently for further analysis. An additional filtering step was to count fixes (across all scientists) in square grid cells and remove all fixes in grid cells with more than 50 fixes.
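The distance, turn-angle, and grid-density filters described above can be sketched as follows. This is a stdlib-only Python illustration, not the paper's pipeline (which is implemented in R); `filter_fixes` and its planar `(x, y)` coordinates are a hypothetical simplification of projected fix locations.

```python
import math
from collections import Counter

def turn_angle(p0, p1, p2):
    """Absolute change of heading (degrees) at p1 for the step p0 -> p1 -> p2."""
    h1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    h2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    d = abs(math.degrees(h2 - h1))
    return min(d, 360.0 - d)

def filter_fixes(fixes, min_dist=2.5, max_turn=150.0, cell=100.0, max_per_cell=50):
    """Drop fixes with short steps, sharp turns, or in over-visited grid cells.

    fixes: sequence of (x, y) positions in a projected (metre) coordinate system.
    """
    kept = []
    for i in range(1, len(fixes) - 1):
        p0, p1, p2 = fixes[i - 1], fixes[i], fixes[i + 1]
        if math.hypot(p1[0] - p0[0], p1[1] - p0[1]) < min_dist:
            continue  # implausibly short step: likely GPS drift while stationary
        if turn_angle(p0, p1, p2) > max_turn:
            continue  # sharp turn: likely milling about rather than hiking
        kept.append(p1)
    # density filter: drop all fixes falling in crowded grid cells
    cells = Counter((int(x // cell), int(y // cell)) for x, y in kept)
    return [(x, y) for x, y in kept
            if cells[(int(x // cell), int(y // cell))] <= max_per_cell]
```

The ice-cover filter is omitted here, as it requires the geology raster rather than the track alone.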
This left one persistent concern: an over-representation of consecutive fixes recorded at exactly the same elevation, resulting in many fixes with estimated slopes of exactly 0, and leading to a clearly evident dip in estimated movement speeds at 0 slope (Figure 4 (a)). It is likely that these fixes are associated with GPS device drift, so it was decided to remove all fixes where the estimated slope was exactly 0. Figure 4 (b) shows the improvement in even a crudely estimated hiking function derived from locally estimated scatterplot smoothing (LOESS). Note that such functions are likely overfitted and are not used further in our analysis, where we favour more easily parameterised functions such as those discussed in Section 2.1.
4 Methods and results
4.1 Hiking functions
We fit three alternative functional forms to the cleaned GPS data: exponential (Tobler 1993), Gaussian (following Irmischer and Clarke 2018), and Lorentz (following Campbell et al. 2019), using the Levenberg-Marquardt algorithm (Moré 1978) as provided by the nlsLM function in the minpack.lm R package (Elzhov et al. 2022). The raw data and fitted curves are shown in Figure 5.
Both the Lorentz and Gaussian functions fit clearly better than the exponential form, but the improvement offered by the Lorentz function over the Gaussian is marginal: residual standard error 1.489 vs 1.491 on moraine, and 1.487 vs 1.488 on rock. Furthermore, inspection of the curves shows that estimated hiking speeds for the Gaussian functions are much closer to a plausible zero on very steep slopes. We therefore chose to adopt Gaussian hiking functions for the remainder of the present work.
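As an illustration of fitting the Gaussian form, the sketch below exploits the fact that taking logs turns \(v = a\exp[-(s+b)^2/c]\) into a quadratic in the slope \(s\), so ordinary least squares suffices. This is a stdlib-only Python stand-in, not the method used in the paper: our actual fits use nlsLM in R, and log-linearisation weights noisy observations differently from direct nonlinear least squares.

```python
import math

def fit_gaussian_hiking(slopes, speeds):
    """Fit v = a * exp(-(s + b)^2 / c) by least squares on log(v).

    ln v = A s^2 + B s + C is linear in (A, B, C), with
    A = -1/c, B = -2b/c, C = ln a - b^2/c.
    """
    n = len(slopes)
    X = [[s * s, s, 1.0] for s in slopes]
    y = [math.log(v) for v in speeds]
    # normal equations M beta = r with M = X'X and r = X'y
    M = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)]
         for i in range(3)]
    r = [sum(X[k][i] * y[k] for k in range(n)) for i in range(3)]
    # solve the 3x3 system by Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, 3):
            f = M[i][col] / M[col][col]
            for j in range(col, 3):
                M[i][j] -= f * M[col][j]
            r[i] -= f * r[col]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (r[i] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))) / M[i][i]
    A, B, C = beta
    c = -1.0 / A           # recover the Gaussian parameters
    b = -B * c / 2.0
    a = math.exp(C + b * b / c)
    return a, b, c
```

On noise-free synthetic data the Gaussian parameters are recovered exactly, which makes this a useful sanity check on any nonlinear fitting setup.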
In previous work researchers have applied a ground cover penalty cost to a base hiking function to estimate traversal times. We instead, as shown, estimate different hiking functions for the two ground cover types present. The peak speed on rock is attained on steeper downhill slopes than on moraines, perhaps indicative of the greater care required on downhill gravel slopes. Meanwhile the highest speeds on level terrain are attained on moraines.
Overplotting of the hiking functions including an additional model fitted to all the data, confirms that the fitted functions are sufficiently different to retain separate models for each ground cover (see Figure 6). Plotting both functions in the same graph makes clearer the difference in maximum speed and slope at maximum speed associated with each ground cover.
The estimated hiking functions associated with the two land covers are \[ \begin{array}{rcl} v_{\mathrm{moraine}} & = & 4.17\,\exp\left[{-\frac{(\theta+0.0526)^2}{0.236}}\right] \\ v_{\mathrm{rock}} & = & 3.76\,\exp\left[{-\frac{(\theta+0.119)^2}{0.365}}\right] \end{array} \tag{3}\]
where the different maximum speeds (in kilometres per hour) and slopes of maximum speed are apparent.
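The fitted functions in Equation 3 can be evaluated directly; a short Python sketch (for illustration only) confirms the stated peak speeds, the steeper downhill peak on rock, and the moraine advantage on level terrain:

```python
import math

def v_moraine(theta):
    """Estimated hiking speed (km/h) on moraine; theta is slope as rise over run."""
    return 4.17 * math.exp(-(theta + 0.0526) ** 2 / 0.236)

def v_rock(theta):
    """Estimated hiking speed (km/h) on rock."""
    return 3.76 * math.exp(-(theta + 0.119) ** 2 / 0.365)
```

The maxima occur at \(\theta = -0.0526\) (moraine) and \(\theta = -0.119\) (rock), i.e. both on gentle downhill slopes.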
4.2 Landscapes as graphs
We developed R code (R Core Team 2024) to build graphs (i.e. networks) with hexagonal lattice structure and estimated traversal times for graph edges derived from our hiking functions. Graphs are manipulated as igraph package (Csárdi and Nepusz 2006; Csárdi et al. 2024) graph objects for further analysis. An important decision in constructing graphs is the choice of the spacing of the hexagonal lattice, and also of the underlying DEM from which graph vertex elevations are derived. Given the extent of the study sites (see Figure 3 (d)) it was decided that a hexagonal lattice (see Figure 2) with hexagons equivalent in area to 100 metre square cells was appropriate. The hexagon centre to centre spacing of this lattice is given by \[
100\sqrt{\frac{2}{\sqrt{3}}}\approxeq 107.5\,\mathrm{metres}
\tag{4}\] Given this lattice resolution we interpolated vertex elevations from a 32m resolution DEM from the REMA project (Howat et al. 2022) by bilinear interpolation using the R terra package (Hijmans 2024). It would be straightforward to derive vertex elevations from a more detailed DEM if required.
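Equation 4 follows from the area of a regular hexagon, \((3\sqrt{3}/2)s^2\) for side length \(s\), with centre-to-centre spacing \(\sqrt{3}s\) (twice the apothem). A quick check in Python:

```python
import math

cell_area = 100.0 ** 2  # target hexagon area in m^2, matching a 100 m square cell
side = math.sqrt(2 * cell_area / (3 * math.sqrt(3)))  # area = (3*sqrt(3)/2) * side^2
spacing = math.sqrt(3) * side  # centre-to-centre distance between adjacent hexagons
```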
Edge weights (i.e. estimated traversal costs) are assigned by calculating the slope of each directed edge, the estimated hiking speed for that slope, and hence the time taken to traverse the edge. If the estimated traversal time of an edge in the nominal 100 metre lattice is greater than 30 minutes then it is removed from the graph, along with its ‘twin’ edge in the opposite direction. Removing edges in both directions is partly a practical matter, as it simplifies the operation of many graph algorithms, but it can also be justified on the basis that a slope steep enough to be a barrier to ascent is unlikely to be traversed when descending.
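The weighting and pruning step might be sketched as follows (Python for illustration; the paper's implementation uses igraph in R). The moraine hiking function from Equation 3 is assumed, and edges are held in a hypothetical dict keyed by vertex pairs rather than as igraph objects.

```python
import math

def v_moraine(theta):
    """Gaussian hiking function for moraine from Equation 3 (km/h)."""
    return 4.17 * math.exp(-(theta + 0.0526) ** 2 / 0.236)

def edge_weight(z_from, z_to, spacing=107.5):
    """Traversal time (hours) for a directed edge of horizontal length spacing (m)."""
    theta = (z_to - z_from) / spacing  # slope of the edge as rise over run
    return (spacing / 1000.0) / v_moraine(theta)

def prune(edges, max_hours=0.5):
    """Drop edges slower than max_hours, together with their opposite twins."""
    bad = {(u, v) for (u, v), t in edges.items() if t > max_hours}
    bad |= {(v, u) for (u, v) in bad}  # remove both directions together
    return {e: t for e, t in edges.items() if e not in bad}
```

A 120 metre rise over one 107.5 metre edge, for example, yields a traversal time of several hours, so both that edge and its descending twin are pruned.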
After removal of all such edges only the largest connected graph component is retained, so that the resulting hiking network representation is fully connected, with no vertices unreachable from elsewhere in the network. A map of the fifth largest study area’s hiking network is shown in Figure 7. This network includes 30,697 vertices and 174,798 directed edges. The largest of the five study sites (see Figure 3 (d)) results in a network containing almost a quarter of a million vertices and over 1.4 million directed edges. A hiking network can be used to explore many connectivity properties of the environment. For example, for a chosen origin point, a shortest path tree can be derived showing the quickest routes from that origin to all other vertices (Figure 7 (c)).
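A minimal sketch of the component-retention and shortest path tree steps (stdlib-only Python; the paper uses igraph in R, and the dict-of-dicts adjacency here is a hypothetical stand-in for a weighted igraph object):

```python
import heapq

def largest_component(adj):
    """Vertex set of the largest component, treating edges as undirected."""
    undirected = {v: set() for v in adj}
    for v, nbrs in adj.items():
        for w in nbrs:
            undirected[v].add(w)
            undirected[w].add(v)
    seen, best = set(), set()
    for v in adj:
        if v in seen:
            continue
        comp, stack = set(), [v]  # flood fill from v
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(undirected[u] - comp)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def shortest_path_tree(adj, source):
    """Dijkstra from source over adj: vertex -> {neighbour: traversal time}.

    Returns (times, parent); parent encodes the tree of quickest routes.
    """
    times = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        t, v = heapq.heappop(heap)
        if t > times.get(v, float("inf")):
            continue  # stale heap entry
        for w, dt in adj[v].items():
            nt = t + dt
            if nt < times.get(w, float("inf")):
                times[w] = nt
                parent[w] = v
                heapq.heappush(heap, (nt, w))
    return times, parent
```

Following `parent` pointers from any vertex back to the source recovers the quickest route, which is what Figure 7 (c) maps for a chosen origin.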
4.3 Betweenness centrality limited by radius
A particular graph connectivity property we have used to reveal the relative likelihood of different parts of a terrain being frequently traversed is betweenness centrality. In a graph \(G=(V,E)\) a path \(P\) is an ordered sequence of vertices: \[ P=\left(v_1,v_2,\ldots,v_n\right) \] such that each consecutive pair of vertices \(v_i\) and \(v_{i+1}\) is adjacent, i.e. connected by a directed edge \(e_{i,i+1}\). The simple length of a path is the number of edges it contains. The weighted length of a path is the sum over its edges of a weight or cost associated with each edge, which we denote \(w_{i,j}\); that is, the length \(L\) of a path \(P\) is given by \[ L(P)=\sum_{i=1}^{n-1} w_{i,i+1} \tag{5}\] The shortest path from \(v_i\) to \(v_j\) is the path \(P=\left(v_i,\dots,v_*,\ldots,v_j\right)\) such that \(L(P)\) is minimised. In a regular lattice such as our hiking networks there are many equal length shortest paths between any pair of vertices when edges are unweighted. When edge weights are considered, the shortest path will typically be unique, or one of only a small number of possibilities of equal length.
Shortest paths are used to develop many measures of graph centrality (Freeman 1978). For example, the mean shortest path length of a vertex is the mean of the shortest path lengths from that vertex to every other vertex in the graph. A vertex with low mean shortest path length might be considered more central than a vertex with a higher mean shortest path length.
One measure of vertex centrality in a graph is betweenness centrality. The betweenness centrality of a vertex is the total number of times that vertex lies on shortest paths between every other pair of vertices in the network. When the number of shortest paths or geodesics between two vertices \(v_j\) and \(v_k\) is \(g_{jk}\), each appearance on those shortest paths of a vertex only contributes \(1/g_{jk}\) to that vertex’s betweenness centrality. Formally, if we denote the number of times \(v_i\) appears on shortest paths between \(v_j\) and \(v_k\) by \(g_{jk}(v_i)\), then we can write this as \[ \mathrm{betweenness}(v_i)=\sum_{j\neq i}\sum_{k\neq i,j}\frac{g_{jk}(v_i)}{g_{jk}} \tag{6}\] Betweenness centrality is particularly relevant to applications where we are interested in graph vertices that control movement through the graph. An edge betweenness centrality measure can also be calculated on similar principles, but is not considered further here.
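Equation 6 can be implemented literally by enumerating all geodesics between each ordered pair and crediting each interior vertex with \(1/g_{jk}\). The Python sketch below (for illustration only) does exactly that for unweighted directed graphs; path enumeration is exponential in the worst case and only suitable for tiny examples, which is why practical implementations use Brandes's algorithm instead.

```python
from collections import deque
from itertools import permutations

def shortest_paths(adj, s, t):
    """All shortest paths from s to t in an unweighted directed graph (BFS)."""
    dist, preds = {s: 0}, {s: []}
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                q.append(w)
            elif dist[w] == dist[v] + 1:
                preds[w].append(v)  # another geodesic reaching w
    if t not in dist:
        return []
    paths = []
    def walk(v, tail):  # expand predecessor lists back to s
        if v == s:
            paths.append([s] + tail)
            return
        for p in preds[v]:
            walk(p, [v] + tail)
    walk(t, [])
    return paths

def betweenness(adj):
    """Equation 6 taken literally: sum g_jk(v_i) / g_jk over ordered pairs."""
    bc = dict.fromkeys(adj, 0.0)
    for j, k in permutations(adj, 2):
        geodesics = shortest_paths(adj, j, k)
        if not geodesics:
            continue
        g = len(geodesics)
        for path in geodesics:
            for v in path[1:-1]:  # interior vertices only (i != j and i != k)
                bc[v] += 1.0 / g
    return bc
```

On a diamond graph with two equal-length routes between opposite corners, each intermediate vertex receives the fractional credit \(1/2\) per ordered pair, illustrating the \(1/g_{jk}\) weighting.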
As might be expected, betweenness centralities are computationally demanding to calculate. The time complexity of early algorithms was \(\mathcal{O}(n^3)\) where \(n\) is the number of vertices in the graph (Brandes 2001). An implementation of Brandes’s algorithm (2001), which has time complexity \(\mathcal{O}(nm + n^2\log n)\) where \(m\) is the number of edges in the graph, is provided in the igraph package (Csárdi et al. 2024). Even with this improvement in performance, in our application such computational complexity is a strong motivation for working at a nominal 100 metre resolution. Halving the resolution to 50 metres would increase the number of graph vertices four-fold, and lead to substantial increases in the times taken to calculate betweenness centralities.
Fortunately, the igraph implementation of betweenness centrality provides an option to radius-limit calculations, meaning that only paths shorter than a specified radius are considered in the counting of appearances on shortest paths. While ‘all graph’ betweenness centralities can be standardised relative to a theoretical maximum of \((n-1)(n-2)\), no straightforward standardisation is possible for the radius-limited case. Since we are primarily interested in betweenness centrality as a measure of the relative vulnerability of different locations to human impacts, we linearly rescale measured betweenness values with respect to the results of a given analysis for visualisation purposes.
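A radius limit can be grafted onto Brandes's algorithm by simply not extending any search beyond the radius, so only geodesics within that distance are counted. The stdlib-only Python sketch below is a hop-count simplification of this idea, for illustration only (our analysis uses the igraph option, which limits by path length on the weighted network):

```python
from collections import defaultdict, deque

def betweenness_within_radius(adj, radius):
    """Brandes-style betweenness counting only geodesics of <= radius hops."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        sigma = dict.fromkeys(adj, 0.0)  # number of geodesics from s
        sigma[s] = 1.0
        dist = dict.fromkeys(adj, None)
        dist[s] = 0
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            if dist[v] == radius:
                continue  # do not extend any path beyond the radius
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate path dependencies (Brandes 2001)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a five-vertex path graph, for example, a radius of 2 credits the middle vertex only with the two adjacent-pair geodesics that pass through it, while a radius spanning the whole graph recovers the unrestricted betweenness.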
4.4 Impact minimizing networks
Tentative proposal for impact minimizing networks based on minimum spanning trees, but noting the issue with respect to directed graphs when these would more correctly be arborescences (Korte and Vygen 2018).
5 Discussion
Blah
6 Conclusions
Blah